Improved Information Gain Estimates for Decision Tree Induction
Abstract
Ensembles of classification and regression trees remain popular machine learning methods because they define flexible nonparametric models that predict well and are computationally efficient during both training and testing. During induction of decision trees, one aims to find predicates that are maximally informative about the prediction target. To select good predicates, most approaches estimate an information-theoretic scoring function, the information gain, for both classification and regression problems. We point out that the common estimation procedures are biased and show that by replacing them with improved estimators of the discrete and the differential entropy we can obtain better decision trees. In effect, our modifications yield improved predictive performance and are simple to implement in any decision tree code.
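To make the bias claim concrete for the classification case, the following is a minimal sketch of information gain computed with two interchangeable entropy estimators. The abstract does not name the improved estimators, so the Miller-Madow correction used here is a swapped-in, well-known bias correction chosen for illustration only, not necessarily the authors' estimator. All function names (`plugin_entropy`, `miller_madow_entropy`, `information_gain`) are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy(labels):
    """Naive (plug-in) entropy in nats, computed from empirical class frequencies.

    This estimator is known to underestimate the true entropy on small samples.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def miller_madow_entropy(labels):
    """Plug-in entropy plus the Miller-Madow correction (K - 1) / (2 n).

    K is the number of distinct classes observed and n is the sample size.
    Illustrative assumption: the source does not state which estimator it uses.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    k = len(set(labels))
    return plugin_entropy(labels) + (k - 1) / (2 * n)


def information_gain(labels, mask, entropy=plugin_entropy):
    """Information gain of a binary predicate.

    `mask[i]` is True when sample i satisfies the predicate (goes left).
    `entropy` is any function mapping a list of labels to an entropy estimate.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    return (
        entropy(labels)
        - (len(left) / n) * entropy(left)
        - (len(right) / n) * entropy(right)
    )


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    mask = [True, True, False, False]  # a predicate that separates the classes
    print(information_gain(labels, mask, plugin_entropy))
    print(information_gain(labels, mask, miller_madow_entropy))
```

Passing the estimator as a parameter keeps the split-selection code unchanged when the estimator is swapped, which is in the spirit of the abstract's remark that the modifications are simple to add to existing decision tree code.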
Similar resources
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...
DIAGNOSIS OF BREAST LESIONS USING THE LOCAL CHAN-VESE MODEL, HIERARCHICAL FUZZY PARTITIONING AND FUZZY DECISION TREE INDUCTION
Breast cancer is one of the leading causes of death among women. Mammography remains today the best technology to detect breast cancer, early and efficiently, to distinguish between benign and malignant diseases. Several techniques in image processing and analysis have been developed to address this problem. In this paper, we propose a new solution to the problem of computer aided detection and...
Constructing a Fuzzy Decision Tree by Integrating Fuzzy Sets and Entropy
Decision tree induction is one of the common approaches for extracting knowledge from a set of feature-based examples. In the real world, much data occurs in a fuzzy and uncertain form. The decision tree must be able to deal with such fuzzy data. This paper presents a tree construction procedure to build a fuzzy decision tree from a collection of fuzzy data by integrating fuzzy set theory and entropy. ...
Fuzzy Decision Tree Induction Approach for Mining Fuzzy Association Rules
Decision Tree Induction (DTI), one of the Data Mining classification methods, is used in this research for predictive problem solving in analyzing patient medical track records. In this paper, we extend the concept of DTI dealing with meaningful fuzzy labels in order to express human knowledge for mining fuzzy association rules. Meaningful fuzzy labels (using fuzzy sets) can be defined for each...
Improved surname pronunciations using decision trees
Proper noun pronunciation generation is a particularly challenging problem in speech recognition since a large percentage of proper nouns often defy typical letter-to-sound conversion rules. In this paper, we present decision tree methods which outperform neural network techniques. Using the decision tree method, we have achieved an overall error rate of 45.5%, which is a 35% reduction o...
Journal title:
- CoRR
Volume abs/1206.4620
Publication date 2012